Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
- Free, publicly-accessible full text available May 1, 2026
- Deep neural networks trained using gradient descent with a fixed learning rate eta often operate in the regime of "edge of stability" (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold 2/eta. In this work, we present a fine-grained analysis of the learning dynamics of (deep) linear networks (DLNs) within the deep matrix factorization loss beyond EOS. For DLNs, loss oscillations beyond EOS follow a period-doubling route to chaos. We theoretically analyze the regime of the 2-period orbit and show that the loss oscillations occur within a small subspace, with the dimension of the subspace precisely characterized by the learning rate. The crux of our analysis lies in showing that the symmetry-induced conservation law for gradient flow, defined as the balancing gap among the singular values across layers, breaks at EOS and decays monotonically to zero. Overall, our results contribute to explaining two key phenomena in deep networks: (i) shallow models and simple tasks do not always exhibit EOS; and (ii) oscillations occur within top features. We present experiments to support our theory, along with examples demonstrating how these phenomena occur in nonlinear networks and how they differ from networks with benign landscapes, such as DLNs.
  Free, publicly-accessible full text available March 25, 2026
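A minimal, illustrative sketch of the setting above, assuming only what the abstract states: gradient descent on a two-layer deep matrix factorization, tracking the loss and the balancing gap ||W1 W1^T - W2^T W2||_F that the analysis shows breaks and decays beyond EOS. Dimensions, initialization, and the learning rate are placeholder choices, not the paper's.

    # Illustrative deep matrix factorization dynamics; not the authors' code.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 10
    M = rng.standard_normal((d, d))          # target matrix to factorize
    W1 = 0.1 * rng.standard_normal((d, d))   # first layer
    W2 = 0.1 * rng.standard_normal((d, d))   # second layer
    eta = 0.05                               # fixed learning rate; a larger eta
                                             # pushes sharpness toward 2/eta

    for t in range(2001):
        E = W2 @ W1 - M                      # residual, loss = 0.5 * ||E||_F^2
        g1 = W2.T @ E                        # gradient w.r.t. W1
        g2 = E @ W1.T                        # gradient w.r.t. W2
        W1 -= eta * g1
        W2 -= eta * g2
        # balancing gap: conserved under gradient flow, decays beyond EOS
        gap = np.linalg.norm(W1 @ W1.T - W2.T @ W2)
        if t % 400 == 0:
            print(f"step {t:4d}  loss {0.5 * np.sum(E**2):10.4f}  gap {gap:8.4f}")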
- Free, publicly-accessible full text available March 5, 2026
- Large-scale foundation models have demonstrated exceptional performance in language and vision tasks. However, the numerous dense matrix-vector operations involved in these large networks pose significant computational challenges during inference. To address these challenges, we introduce the Block-Level Adaptive STructured (BLAST) matrix, designed to learn and leverage efficient structures prevalent in the weight matrices of linear layers within deep learning models. Compared to existing structured matrices, the BLAST matrix offers substantial flexibility, as it can represent various types of structures that are either learned from data or computed from pre-existing weight matrices. We demonstrate the efficiency of using the BLAST matrix for compressing both language and vision tasks, showing that (i) for medium-sized models such as ViT and GPT-2, training with BLAST weights boosts performance while reducing complexity by 70% and 40%, respectively; and (ii) for large foundation models such as Llama-7B and DiT-XL, the BLAST matrix achieves a 2x compression while exhibiting the lowest performance degradation among all tested structured matrices. Our code is available at https://github.com/changwoolee/BLAST.
  Free, publicly-accessible full text available December 6, 2025
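The exact BLAST parameterization is defined in the paper and the linked repository; what follows is only a plausible block low-rank sketch in its spirit, in which each block W_ij = U_i diag(S_ij) V_j^T shares left and right factors along block-rows and block-columns. All names and dimensions here are assumptions for illustration.

    # Hypothetical block-structured matvec in the spirit of BLAST; see the
    # linked repository for the actual factorization.
    import numpy as np

    rng = np.random.default_rng(1)
    b, p, r = 4, 64, 8                       # block grid, block size, per-block rank
    U = rng.standard_normal((b, p, r))       # shared left factors per block-row
    V = rng.standard_normal((b, p, r))       # shared right factors per block-column
    S = rng.standard_normal((b, b, r))       # block-specific diagonal couplings

    def block_lowrank_matvec(x):
        """y = W x with W_ij = U_i diag(S_ij) V_j^T, never forming W."""
        xb = x.reshape(b, p)
        t = np.einsum("jpr,jp->jr", V, xb)   # project each input block once
        z = np.einsum("ijr,jr->ir", S, t)    # mix blocks through the couplings
        return np.einsum("ipr,ir->ip", U, z).reshape(b * p)

    x = rng.standard_normal(b * p)
    y = block_lowrank_matvec(x)              # ~O(b*p*r + b^2*r) vs O((b*p)^2) dense
    print(y.shape)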
- Free, publicly-accessible full text available December 5, 2025
- In this work, we present a novel approach for compressing overparameterized models, developed through studying their learning dynamics. We observe that for many deep models, updates to the weight matrices occur within a low-dimensional invariant subspace. For deep linear models, we demonstrate that their principal components are fitted incrementally within a small subspace, and use these insights to propose a compression algorithm for deep linear networks that involves decreasing the width of their intermediate layers. We empirically evaluate the effectiveness of our compression technique on matrix recovery problems. Remarkably, by using an initialization that exploits the structure of the problem, we observe that our compressed network converges faster than the original network, consistently yielding smaller recovery errors. We substantiate this observation by developing a theory focused on deep matrix factorization. Finally, we empirically demonstrate how our compressed model has the potential to improve the utility of deep nonlinear models. Overall, our algorithm improves the training efficiency by more than 2x, without compromising generalization.
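A hedged sketch of the core idea, under the abstract's premise that updates live in a low-dimensional subspace: shrink the intermediate width of a two-layer linear network to rank k via a balanced truncated SVD. This is illustrative, not the paper's exact compression algorithm.

    # Illustrative width compression of a deep linear network via truncated SVD.
    import numpy as np

    rng = np.random.default_rng(2)
    d, k = 32, 4
    W1 = 0.1 * rng.standard_normal((d, d))
    W2 = 0.1 * rng.standard_normal((d, d))

    U, s, Vt = np.linalg.svd(W2 @ W1)
    W1_c = np.diag(np.sqrt(s[:k])) @ Vt[:k]    # shape (k, d): narrow first layer
    W2_c = U[:, :k] @ np.diag(np.sqrt(s[:k]))  # shape (d, k): narrow second layer

    # For trained networks the spectrum is concentrated, so a small k suffices;
    # these random weights only demonstrate the mechanics.
    err = np.linalg.norm(W2 @ W1 - W2_c @ W1_c) / np.linalg.norm(W2 @ W1)
    print(f"relative error of the width-{k} network: {err:.3f}")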
- Latent diffusion models have been demonstrated to generate high-quality images, while offering efficiency in model training compared to diffusion models operating in the pixel space. However, incorporating latent diffusion models to solve inverse problems remains a challenging problem due to the nonlinearity of the encoder and decoder. To address these issues, we propose ReSample, an algorithm that can solve general inverse problems with pre-trained latent diffusion models. Our algorithm incorporates data consistency by solving an optimization problem during the reverse sampling process, a concept we term hard data consistency. Upon solving this optimization problem, we propose a novel resampling scheme to map the measurement-consistent sample back onto the noisy data manifold and theoretically demonstrate its benefits. Lastly, we apply our algorithm to solve a wide range of linear and nonlinear inverse problems in both natural and medical images, demonstrating that our approach outperforms existing state-of-the-art approaches, including those based on pixel-space diffusion models.
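To make the hard data consistency idea concrete, here is a hedged sketch with linear stand-ins for the decoder and measurement operator (the paper uses a pretrained latent diffusion model and general forward operators): solve an optimization problem starting from the denoised latent estimate, then stochastically resample back to the noise level of the current step. Function names and the noise schedule are placeholders.

    # Sketch of hard data consistency + resampling with linear stand-ins.
    import numpy as np

    rng = np.random.default_rng(3)
    dec = rng.standard_normal((64, 16)) / 8.0    # stand-in decoder D(z) = dec @ z
    meas = rng.standard_normal((32, 64)) / 8.0   # stand-in forward operator A
    y = meas @ dec @ rng.standard_normal(16)     # synthetic measurements

    def hard_data_consistency(z0_hat, steps=300, lr=0.5):
        """Solve min_z 0.5 * ||A(D(z)) - y||^2 from the denoised estimate."""
        z = z0_hat.copy()
        B = meas @ dec
        for _ in range(steps):
            z -= lr * B.T @ (B @ z - y)          # gradient step on the data term
        return z

    def resample(z0, sigma_t):
        """Map the measurement-consistent sample back onto the noisy manifold."""
        return z0 + sigma_t * rng.standard_normal(z0.shape)

    z0_hat = rng.standard_normal(16)             # pretend denoiser output at step t
    z0_star = hard_data_consistency(z0_hat)
    z_t = resample(z0_star, sigma_t=0.1)
    print(np.linalg.norm(meas @ dec @ z0_star - y))  # data residual, near zero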
- We study the low-rank phase retrieval problem, where the objective is to recover a sequence of signals (typically images) given the magnitude of linear measurements of those signals. Existing solutions involve recovering a matrix constructed by vectorizing and stacking each image. These solutions model this matrix as low-rank and leverage the low-rank property to decrease the sample complexity required for accurate recovery. However, when the number of available measurements is more limited, these low-rank matrix models can often fail. We propose an algorithm called Tucker-Structured Phase Retrieval (TSPR) that models the sequence of images as a tensor rather than a matrix and factorizes it using the Tucker decomposition. This factorization reduces the number of parameters that need to be estimated, allowing for a more accurate reconstruction. We demonstrate the effectiveness of our approach on real video datasets under several different measurement models.
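A short sketch of the signal model TSPR fits, assuming only the abstract: the video is a third-order tensor with a Tucker factorization X = G x1 U1 x2 U2 x3 U3, observed through magnitude-only linear measurements of each frame. The recovery loop itself is omitted; ranks and sizes are illustrative.

    # Tucker-structured signal model behind TSPR (recovery loop omitted).
    import numpy as np

    rng = np.random.default_rng(4)
    n1, n2, T = 16, 16, 20                   # frame height, width, frame count
    r1, r2, r3 = 4, 4, 3                     # Tucker ranks

    G = rng.standard_normal((r1, r2, r3))    # core tensor
    U1 = rng.standard_normal((n1, r1))       # factor along frame rows
    U2 = rng.standard_normal((n2, r2))       # factor along frame columns
    U3 = rng.standard_normal((T, r3))        # factor along time
    X = np.einsum("abc,ia,jb,tc->ijt", G, U1, U2, U3)   # full video tensor

    print("dense parameters:", n1 * n2 * T)
    print("Tucker parameters:", G.size + U1.size + U2.size + U3.size)

    m = 100                                  # measurements per frame
    A = rng.standard_normal((m, n1 * n2))
    Y = np.abs(A @ X.reshape(n1 * n2, T))    # magnitudes that TSPR inverts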
- Statistical machine learning algorithms often involve learning a linear relationship between dependent and independent variables. This relationship is modeled as a vector of numerical values, commonly referred to as weights or predictors. These weights allow us to make predictions, and the quality of these weights influences the accuracy of our predictions. However, when the dependent variable inherently possesses a more complex, multidimensional structure, it becomes increasingly difficult to model the relationship with a vector. In this paper, we address this issue by investigating machine learning classification algorithms with multidimensional (tensor) structure. By imposing tensor factorizations on the predictors, we can better model the relationship, as the predictors would take the form of the data in question. We empirically show that our approach works more efficiently than the traditional machine learning method when the data possesses both an exact and an approximate tensor structure. Additionally, we show that estimating predictors with these factorizations also allows us to solve for fewer parameters, making computation more feasible for multidimensional data.
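A hedged sketch of the idea for matrix-shaped data, assuming a rank-1 factorization of the predictor (the paper considers more general tensor factorizations): logistic regression with W = u v^T estimates n1 + n2 parameters instead of n1 * n2.

    # Logistic regression with a rank-1 factorized predictor (illustrative).
    import numpy as np

    rng = np.random.default_rng(5)
    n1, n2, N = 8, 8, 500
    u_true, v_true = rng.standard_normal(n1), rng.standard_normal(n2)
    X = rng.standard_normal((N, n1, n2))
    y = (np.einsum("nij,i,j->n", X, u_true, v_true) > 0).astype(float)

    u = 0.1 * rng.standard_normal(n1)
    v = 0.1 * rng.standard_normal(n2)
    lr = 0.1
    for _ in range(500):
        z = np.einsum("nij,i,j->n", X, u, v)          # logits <X_n, u v^T>
        p = 1.0 / (1.0 + np.exp(-z))                  # sigmoid probabilities
        # alternating-style gradients of the logistic loss for each factor
        gu = np.einsum("n,nij,j->i", p - y, X, v) / N
        gv = np.einsum("n,nij,i->j", p - y, X, u) / N
        u -= lr * gu
        v -= lr * gv

    acc = np.mean((np.einsum("nij,i,j->n", X, u, v) > 0) == (y > 0.5))
    print(f"training accuracy with {n1 + n2} parameters: {acc:.2f}")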